CHARACTER AND VALUE OF
STANDARDIZED TESTS IN HISTORY¹



EARLE UNDERWOOD RUGG 
Oak Park High School, Oak Park, Illinois 



INTRODUCTION 

The quantitative movement in such school subjects as 
spelling, arithmetic, algebra, and handwriting has turned 
attention to the need for objective devices in the field of history. 
Complexity of subject-matter and differences in teacher
judgment make it desirable to construct devices that will
enable one to score and mark various types of history exercises
with more precision than is possible under the present examination
system. The purpose of this article is (1) to acquaint
the teacher of history with the existing test material; (2) to
point out in some detail the general features of the tests; (3)
to describe some of their defects; and (4) to discuss the value
of the movement.

EXISTING TESTS IN THE FIELD 

At present there are eleven tests that aim to measure some 
phase of history. With few exceptions they have not been 
well standardized. Those that are merely in a tentative state 
are included, however, to indicate the general tendencies of 
the measuring movement in this subject. The writer presents 
in summary form the salient features of the eleven tests in 
the field (see pages 768-69). Only four of the authors of
these tests have published their investigations.



¹ Paper reported to the history section of the University of Chicago Conference of Secondary
Schools, May, 1919, and to the social science section of the High School Conference of Illinois,
November, 1919.
GENERAL FEATURES



A majority of the tests in history have made use primarily
of facts or information of a historical nature; the assumption
is that the readiness with which students answer factual
questions is a measure of a thing called historical ability.
The tests are, from the point of view of psychology, purely
associational. For example, Sackett asks the student to name
a writer, painter, orator, general, etc., noted in ancient history;
he also asks the significance of such terms as "The
Battle of Tours" or "The Age of Pericles." Bell requires the
pupil to give the important event occurring in 1861, 1789,
1620, 1492, etc., to state such things as the important principle
of each political party, and to name the great epochs or
movements in American history. Harlan, Rayner, and Starch, in a
different type called completion tests, require the student to
insert, in spaces purposely left blank, the correct items of
historical information. The material in these exercises
is so arranged that, once the blanks are filled, it
forms a historical narrative. Davis has devised another type,
different from the two described above. He suggests several
possible responses and asks the student to underline the answer
that he deems to be correct, e.g., "The Mayflower was a hall,
chapel, hotel, plant, ship." In general, all the tests mentioned
thus far involve only elementary associational facts.

The tests of Buckingham, Van Wagenan, Barr, and Rugg
are more complicated, for in these the pupil is called upon to
exercise more intricate mental processes such as thought,
reasoning, historical inference, and judgment. Buckingham
took the first step beyond the testing of facts when he carried
out an investigation which showed a rather marked correlation
between the ability to think and the ability to remember in
history.¹ To ascertain this he gave a series of thought questions,
of which the following is a sample:

For many years after the coming of Columbus, explorers wandered 
about in the forests of the new world, and paddled their canoes up and down 
the great rivers without thinking very seriously of colonization. What 
were they thinking about and what were they trying to do? 

Another sample of the thought questions is taken from
Van Wagenan's series B.

A hundred years ago it took a letter several days to go from New York 
to Boston; today it takes only a few hours. Why do you think it took 
letters so much longer to go from New York to Boston one hundred years 
ago than now? 

Dr. Van Wagenan has introduced the judgment factor by
including in his tests a series on "character judgment." In
this type the student's conception of a personage is obtained
by quoting a historical passage depicting some act of this person
and then asking the student to underline, out of eight or nine
suggested adjectives, the three which best describe the character
of that act. Thus, he quotes a passage describing the rudeness of
Secretary Stanton in tearing up a note from President Johnson
presented by Mrs. Clay, wife of an imprisoned Confederate general.
The pupil underscores the three of the following that best describe
Stanton's action: rude, callous, generous, courteous, tactful,
cautious, thoughtful, sympathetic, insolent, and considerate. Barr,
in his diagnostic tests, attempts to measure such things as
historical comprehension, inference, and constructive imagination.
His plan is similar to Van Wagenan's in that he quotes a historical
passage and asks for responses to questions on it. Preliminary
results indicate that such diagnostic tests are possible, but, as
Mr. Barr asserts, the material must be carefully revised and given
to a much larger group before actual predictions can be made.
Similarly, the writer has made a preliminary study of historical
judgment. He tested chronological, causal, and critical judgment.
A tabulation of the results obtained from one hundred and sixteen
students in Oak Park High School indicates the feasibility of
measuring this factor. These tests present various classes of
events, such as social, political, and military. The pupil is asked
to number them 1, 2, 3, etc., in the order in which they occurred.
In the critical-judgment test he is asked to label various types
of historical books, which indicates the readiness with which he can
distinguish texts, source accounts, biography, etc. The causal-judgment
test seeks to discover the ability of the pupil to pick out, from
several suggested responses, the one that relates best to the
causative element. The problem is to show experimentally that history
trains the judgment. The tests in this group illustrate the tendency
to attempt measurement of the more intricate historical outcomes.
Summarizing, then, we find two distinct types of tests in the field
of history: (1) those that make the ability to answer factual
questions the primary end, and (2) those that are concerned with the
measurement of the higher mental processes, namely, thought,
reasoning, imagination, inference, and judgment.

¹ School and Society, V (April 14, 1917), 443-48.
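The chronological-judgment exercise described above asks the pupil to number events 1, 2, 3, etc., but the article does not say how such an answer is scored. The sketch below assumes one possible rule: count the pairs of events the pupil placed in the same relative order as the key, so that a single transposition costs one pair rather than the whole item. The function name `chronology_score` and the sample events are illustrative only.

```python
from itertools import combinations


def chronology_score(pupil_numbers, key_order):
    """Score a chronological-judgment item by pairwise agreement (an assumed rule).

    pupil_numbers: dict mapping each event to the number (1, 2, 3, ...) the pupil wrote.
    key_order: list of the same events, earliest first.
    Returns (pairs placed in the correct relative order, total pairs).
    """
    if set(pupil_numbers) != set(key_order):
        raise ValueError("pupil answer and key must list the same events")
    agreed = 0
    total = 0
    # combinations() yields each (earlier, later) pair, in key order, exactly once.
    for earlier, later in combinations(key_order, 2):
        total += 1
        if pupil_numbers[earlier] < pupil_numbers[later]:
            agreed += 1
    return agreed, total


# Hypothetical item: the pupil has transposed the middle two events.
key = ["Declaration of Independence", "Louisiana Purchase",
       "Missouri Compromise", "Emancipation Proclamation"]
pupil = {"Declaration of Independence": 1, "Missouri Compromise": 2,
         "Louisiana Purchase": 3, "Emancipation Proclamation": 4}
print(chronology_score(pupil, key))  # (5, 6)
```

A stricter rule, giving credit only when every event is numbered correctly, would score this pupil zero; the pairwise count distinguishes a near miss from a wholly confused sequence.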

GENERAL CRITICISMS OF EXISTING TESTS 

1. Turning to an examination of the results secured from
giving these factual tests in classroom work, one finds that pupils
do not retain a great deal of historical information.¹ Therefore,
the assumption that the readiness with which pupils answer
historical questions measures historical ability should, at the
least, be qualified. In fact, the writer doubts the validity of the
assumption because of its primary emphasis upon mere memory.² The
dominant aim should be not to memorize historical content but to
give the child as wide an experience with the world as possible.
Dr. Bobbitt, in his book The Curriculum, points out that we must
not hold the child for detailed facts. He urges that the child be
permitted to absorb, through wide reading, as varied a vicarious
experience as it is possible to obtain. Dr. Horn, another curriculum
maker, corroborates this point of view in support of the theory of
social utility. Social utility is his criterion for constructing the
course of study. It means that the course must be devised to meet
the needs of the child, either in future school life or as an adult.
Considered from this aspect, much of the content included in the
factual tests is obsolete. This standard of social utility would
also cast doubt upon the validity of any theory in which facts are
held to be the chief end. Even more conclusive are the actual
experimental investigations reported by some of the authors
of these tests. Thus, Bell found that 668 high-school pupils
retained only 33 per cent of the historical information called for
in his test. Five hundred and fifty pupils in the three upper
grades of the elementary school could answer only 16 per cent of
the same set of questions. In any case, detailed facts, such as
those found in the tests discussed, are not of value in themselves.
These facts must be the means of arriving at an understanding of
the structure of society. Advocates of the social studies today
desire that the child obtain an appreciation and understanding of
his environment. They demand that the world be made socially
intelligible to him. Historical facts should be only the means to
this end.

¹ This assertion is made after the following tests were tried out in my own classes in the
Oak Park High School: Bell, Sackett, Starch, Harlan, and Davis.

² G. C. Myers, "Delayed Recall in History," Journal of Educational Psychology, VIII
(May, 1917), 275.

2. Many of the tests are faulty because they do not embrace
content vital to the course of study. Some of these authors
have shown unfamiliarity with the recent tendencies in this
field in that they have included content that is not taught at
present. Progressive teachers are agreed that the present must
be stressed to a far higher degree than it has been in past years.
Thus ancient history as a required subject appears to be
doomed.¹ Moreover, the tendency now is that the child should
not be held for the detailed facts in a text. Hence, details such
as are found in the above tests should be omitted. Such
questions as "Who was Mithridates?" "What was the date of
the invasion of Spain by the Saracens?" or "When was the New
Haven colony founded?" call for isolated facts that fail to aid the
child in understanding modern society.

However, to be constructive in respect to vital content we
must ask two questions: What historical material is vital to the
course of study? And what shall be our criterion for determining
its value? First, content is vital to the course of study if it gives
the child an appreciation of the structure of modern society.
This does not mean that all history must necessarily be recent.
An appreciation of this age is obtained partly from a study of
the development of things in the past. One must become
conscious of the evolution of civilization: that society is an
organism and is growing. But content to be included must
have a definite relationship to things of the present time.
Secondly, our criterion should be the social needs of the child,
either in school or as an adult. The course should represent the
most important needs of the community, selected from the
habits, ideals, skills, information, etc., found by an analysis
of the community.² The importance, frequency of occurrence,
or cruciality of such needs will largely determine what
material is desirable. For example, an analysis of the platforms
of political parties has brought to light recurring problems
such as the tariff, finance, interstate commerce, and immigration.
Certainly these must be known if the child is to
understand things about him today. Knowledge of how to
restore respiration in case of drowning, by contrast, is an
example of content included on the basis of cruciality.

Future tests must address this problem, because such
exercises must measure content that is of fundamental
importance to the child.

¹ See the report of the Committee on History and Education for Citizenship, Historical
Outlook, May and June, 1919. This committee recommends four years of social studies, two of
which shall be history. In the tenth year modern history shall be given, at least one semester
of which shall cover the period from 1789 to date. The other year shall be American history in
its broader sense, to come in the eleventh year.

² The writer is indebted to Professor Horn for this theory of social utility.

3. A majority of the exercises do not test the basic aims or
outcomes of history. True, we have not as yet shown
experimentally what the valid aims are. A tabulation of some sixty
books, articles, and courses of study, as well as two questionnaire
investigations, which discuss aims of the subject shows that the
aims may be grouped as follows: (1) facts; (2) training for
citizenship; (3) training certain powers such as imagination and
judgment; (4) inculcating within the child a sense of patriotism;
(5) broadening the pupil's point of view; (6) training in seeing
causal relationship. Training in sound habits of study should also
be included. Clearly these are opinions, and in some instances they
could equally be called aims of the other school subjects. They
indicate what history teachers have in mind as objectives. The
writer accepts them tentatively as the basis of tangible outcomes,
at least until we can prove them. We must, however, have objective
evidence on the questions whether history does train the judgment,
whether it creates sound habits of study, and whether it broadens
one's point of view. It is also his conviction that in presenting
various phases of the course one must keep constantly in mind some
particular aim or purpose. Only in this way can one tell with any
degree of accuracy what results are being achieved. The entire
methodology hinges on this point. The construction of standardized
tests (that is, standardized examinations) for each aim is the only
objective means of ascertaining exactly how many of the above or of
other asserted aims are practicable. It seems that few of the writers
of the tests under review were conscious of this fundamental
problem. We have seen that several are built upon the assumption
that the ability to answer questions of historical information
portrays historical ability. I have already commented upon the
invalidity of this theory. It has also been pointed out that several
have attempted to measure other outcomes aside from facts.
The problem before those interested in the testing movement
in this field is to demonstrate what the objectives of historical
instruction are.¹ Then quantitative evidence will show us what we
should aim to do in teaching this subject.

4. Another defect of these exercises is that a majority of
them include historical material covering the entire course,
e.g., in American history from 1492 to the present. This
makes it impossible to administer the tests before the end of the
school year. Dr. Davis has sought to remedy this
trouble by using material from only one period, the colonial
period. This seems to me very fundamental. All testing, to be
valuable, must be done at the crucial time. The teacher is
concerned week by week, month by month, with how much of the
course the students have grasped. In fact, examinations are
usually drawn up to test comprehension of a given epoch or
movement. Periodic tests are simply examinations that have been
standardized; hence it is desirable to obtain objective evidence
of the degree to which a given period is understood. This plan is
preferable because one must mark the pupils for home reports
periodically and because it aids in determining pupil and class
difficulties. Of course, one must also devise a test covering the
entire course. This latter is essential for the final review.

5. Most of the exercises discussed here are so brief in
content that, if they were available to or known by the teacher of
history for any length of time, it would be almost inevitable that
such material would be stressed in presenting the work day by day.
Results obtained in this manner would obviously be valueless. To
forestall this possibility each test must be designed so that it
will include several sheets of material. For example, they can be
labelled series A, B, C, etc. Question 1 on series A can be made to
compare in type and in difficulty with question 1 on the other
series. Similarly each succeeding question will involve material of
the same degree of difficulty, series by series. Such a plan will
make the exercise a test of ability, not an "exhibition" performance.

¹ It should be noted that most of the investigations reported are given in such technical
language that they are beyond the grasp of the history teacher not trained in statistical
interpretation. Instead, these investigations should be reported in simple language.
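One way to check that series A, B, C, etc. really are comparable is to try them out and compare, question by question, the percentage of pupils answering correctly. The sketch below flags any question number whose percent correct differs across the series by more than a tolerance. The function name, the 10-point tolerance, and the trial figures are assumptions for illustration, not results reported in this article.

```python
def mismatched_questions(percent_correct, tolerance=10.0):
    """Return the question numbers (1-based) whose difficulty is not comparable across series.

    percent_correct: dict mapping series name -> list of percent correct, one per question.
    A question is flagged when its spread (max - min) across the series exceeds tolerance.
    """
    series = list(percent_correct.values())
    n_questions = len(series[0])
    if any(len(s) != n_questions for s in series):
        raise ValueError("every series must have the same number of questions")
    flagged = []
    for i in range(n_questions):
        values = [s[i] for s in series]
        if max(values) - min(values) > tolerance:
            flagged.append(i + 1)
    return flagged


# Hypothetical trial results for three parallel series.
trial = {"A": [82, 64, 41], "B": [79, 60, 55], "C": [85, 66, 44]}
print(mismatched_questions(trial))  # [3]
```

Under these assumed figures, question 3 would be revised on one or more series until it is of about the same difficulty everywhere, which is the condition the series plan depends on.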

6. Several of the exercises under review may be criticized
from the point of view of organization. This fault is
particularly evident in the completion tests. They are, in the main,
concerned merely with whether the pupil can insert battles,
dates, events, personages, and place locations in the blank
spaces provided. In short, they stress facts as ends in
themselves, whereas we have pointed out that facts should be used as
means to an end. Moreover, guesswork is likely to creep into the
completion exercises. Experimental investigation shows
that students make a low percentage of correct associations
with historical material involving time sequence and place
location. Where these elements are mixed in with personages,
political events, etc., the student cannot distinguish the type of
response desired. Hence, it is essential that the organization as
to the type of answer required be clear-cut. For example, the
blanks involving time sequence should be marked off distinctly
from those involving battles, place location, political events,
and social movements. One can still retain the narrative form
for the test, and under the suggested plan the pupil will grasp its
organization more easily.

7. These exercises have not been scored or graded in such
a way that they provide the teacher of history with an accurate
guide in marking papers. Her time is limited, and if she
is to be urged to use such tests, the problem of scoring must be
made simple and accurate. We know from various investigations
that teachers vary widely in grading papers. These studies have
shown how difficult it is for even the conscientious teacher to
mark accurately. It is particularly hard in complex subjects like
history, where judgment is intricate and opinion is almost bound
to creep in. Familiarity with the theoretical distribution of pupil
abilities according to the normal curve, or even the use of the
best papers in the class as samples, will not enable one to
determine exactly whether question 1 should receive the same
percentage as succeeding questions. In fact, the probability is
that some questions are much harder than others. Therefore,
to grade with precision the teacher needs to know that a
question is worth three, five, or eight points because several
hundred or even thousands of pupils have answered it with that
average in the past. It has been demonstrated in other school
subjects that standardized tests can be constructed so that the
value of each question is known and is printed beside
the question.
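The idea of printing a point value beside each question can be sketched as follows. The article does not give a rule for converting past averages into three, five, or eight points; the thresholds below (70 and 40 per cent correct) and the function names are hypothetical, chosen only so that questions fewer pupils answered correctly carry more points.

```python
def point_value(percent_correct):
    """Map a question's past percent correct to a point value (hypothetical thresholds)."""
    if percent_correct >= 70:
        return 3  # easy: most pupils answered it correctly
    if percent_correct >= 40:
        return 5  # medium difficulty
    return 8      # hard: fewer than 40 per cent answered it correctly


def score_paper(answered_correctly, past_percent):
    """Total the point values of the questions a pupil answered correctly."""
    return sum(point_value(p)
               for ok, p in zip(answered_correctly, past_percent) if ok)


past = [85, 52, 30, 71]                # hypothetical past averages, one per question
print([point_value(p) for p in past])  # [3, 5, 8, 3], the values printed beside each question
print(score_paper([True, True, False, True], past))  # 11 out of a possible 19
```

With the values fixed in advance from earlier pupils' performance, two teachers marking the same paper arrive at the same total, which is the precision the paragraph above asks for.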

On straight information questions it is relatively easy to
assign scores or marks. With the more intricate questions,
however, where opinion and evidence on the point disagree,
scoring the response is far more complicated. Here the score is
likely to be only approximately accurate. For example, one
would obtain many varying opinions from a class on the question,
"Why did the English colonial policy succeed while that of
the French failed?" if one merely asked for a brief written
answer. The problem of ascertaining which of these answers is
true is colored by many considerations. Most important of all
is the problem of scoring the various answers, assigning to each
its proper credit. Therefore, I sought to avoid this difficulty in
testing causal judgment by stating the outcome as a positive
statement and then suggesting several responses indicative of
possible causes. It should be noted that each of the questions
had been scored on a preliminary trial with pupils to determine
relative difficulty.¹ With the arrangement outlined above, the
pupil was asked to check the answer that he deemed showed the
best causal relationship. For example:

The English colonial policy succeeded while the French policy failed 
because 
1. The English were better farmers than the French.

2. The climate in the English colonies was better. 

3. The natural resources were more suitable. 

4. The English pursued a policy of permanent settlement. 

5. The English government at home was better. 

This checking plan of grading the more complex outcomes
possesses value because it excludes varying responses that are
difficult to grade correctly, and because the answers can be
tabulated more quickly and accurately. Such tabulation diagnoses
pupil and class difficulties. It is, then, by using the
above-mentioned plan that the teacher will secure exact standards
for grading papers. Moreover, she will be able to mark the
papers more easily and in far less time than under the present
system, thus cutting down the rather arduous paper work.
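The tabulation that the checking plan makes possible can be sketched for a single item such as the colonial-policy question above. The sketch counts how many pupils checked each response and computes the percentage who checked the keyed one. Which response is keyed (here, number 4), the function name `tally_item`, and the class's answers are all assumptions for illustration.

```python
from collections import Counter


def tally_item(responses, keyed_choice):
    """Tally one checking item.

    responses: dict mapping pupil name -> number of the response the pupil checked.
    keyed_choice: the response number treated as correct.
    Returns (percent of pupils who checked the keyed response, Counter of all choices).
    """
    counts = Counter(responses.values())
    percent = 100 * counts[keyed_choice] / len(responses)
    return percent, counts


# Hypothetical class; response 4 is assumed to be the keyed answer.
answers = {"Ada": 4, "Ben": 2, "Cora": 4, "Dan": 4, "Eva": 3}
percent, counts = tally_item(answers, keyed_choice=4)
print(percent)               # 60.0
print(counts.most_common())  # [(4, 3), (2, 1), (3, 1)]
```

Counting every response, not only the keyed one, shows which wrong causes a class finds attractive, and that is where the diagnosis of class difficulties comes from.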

VALUE OF THE TESTING MOVEMENT IN HISTORY 

In closing, the writer wishes to stress the value of these
standardized exercises in the field of history. One must admit
that there are obvious limitations to their use. Moreover, many
will say that we cannot measure such complex processes as those
found in the study of history. It is the conviction of the writer,
however, that our judgment and grading of pupil reactions will at
least be refined through the use of these tests. They are valuable,
first, as a check on the basic aims and outcomes of this branch of
the social sciences. It has been my contention throughout this
article that a final statement of aims and outcomes will not be
established until experimental evidence proves the temporary
statement practicable. The wide disagreement, together with the
vague commonplaces found in the articles on aims and outcomes of
history, indicates that there is as yet no real evidence of what the
history teacher should strive to do or what the pupil should strive
to attain. Nevertheless, no teacher ought to be permitted to go
through a course without being forced constantly to ponder, at
least, such questions as these: Why should this subject be taught?
Of what use is this bit of content? Does it meet the needs of the
pupils? By what means can I discover whether it is of value?
Attempts to answer these questions will unearth shortcomings and
deficiencies. The critical teacher will then seek some means to
eliminate them. Tests will be found to be objective devices for
checking valid aims and outcomes.

¹ The writer intends to standardize these questions so that a percentage value may be
assigned to each question.

A second use of these devices is the improvement that they
will bring to classroom instruction. As mentioned in the
preceding paragraph, attention to shortcomings tends to cause
their removal. But the mere realization that something is
wrong, for example, that the pupils fail to grasp time sequence,
is of no consequence unless the teacher has some objective
means which will reveal these defects clearly. A tabulation of
the performances of a class that has taken the test will show not
only individual but also class difficulties.¹ It was by the
method outlined in the footnote that the writer ascertained
just which points in the tests under review in this article were
hardest. It was shown that the classes had no clear conception
of time sequence and of place geography. Knowing this, I
changed my classroom instruction so that these points would
receive especial attention. Moreover, the tabulation enabled me
to diagnose the points of difficulty of the individual pupil. The
work of the teacher should consist largely in showing students
how to study. Yet experiments in supervised study have
demonstrated that many teachers fail in this part of their work. The
reason commonly assigned is that the teacher, presumed to
know correct habits of study, cannot introspect to the extent of
being able to tell others how she herself studies. There are
those, also, who cannot do it because they themselves have poor
study habits. Trite phrases such as "fails to grasp subject" or
"works irregularly" on report cards are of no value to the
supervisor, principal, parent, or even the pupil himself in
framing any plan that any of them may wish to use in
correcting this fault. But stating clearly and in some detail
concrete things to do as means of grasping a subject will help to
eliminate such phrases. Showing pupils how to study is
the primary function of the teacher. Therefore, she should not
be content until she has at her command every method, bit of
technique, tool, and device that will enable her to teach
students how to study. Standardized history tests or examinations
have their place in this connection.

¹ By using cross-section paper one may tabulate the results of a test quickly and accurately.
Write the students' names on the horizontal lines at the left of the paper and the question
numbers across the top, vertically. This will enable one to tell what an individual pupil has
accomplished and will also reveal class difficulties.
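The cross-section tabulation described in the footnote on tabulating test results, with pupils' names down the left and question numbers across the top, can be sketched in a few lines. Row totals show what each pupil has accomplished; column totals show which questions gave the whole class trouble. The function names, the pupils, and their results are hypothetical, and marking a correct answer with "X" is only one convention.

```python
def cross_section(results, questions):
    """Build a pupil-by-question grid like the cross-section-paper tabulation.

    results: dict mapping pupil name -> set of question numbers answered correctly.
    questions: list of question numbers, in the order they head the columns.
    Returns (grid text, list of class totals per question).
    """
    rows = ["Pupil".ljust(8) + "".join(str(q).rjust(3) for q in questions) + "  Total"]
    for pupil, correct in results.items():
        marks = "".join(("X" if q in correct else ".").rjust(3) for q in questions)
        total = sum(q in correct for q in questions)  # this pupil's row total
        rows.append(pupil.ljust(8) + marks + str(total).rjust(7))
    # Column totals: how many pupils answered each question correctly.
    class_totals = [sum(q in correct for correct in results.values()) for q in questions]
    rows.append("Class".ljust(8) + "".join(str(t).rjust(3) for t in class_totals))
    return "\n".join(rows), class_totals


def hardest_questions(questions, class_totals):
    """Return the question(s) the fewest pupils answered correctly."""
    fewest = min(class_totals)
    return [q for q, t in zip(questions, class_totals) if t == fewest]


results = {"Ada": {1, 2, 4}, "Ben": {1, 4}, "Cora": {1, 2, 3, 4}}
questions = [1, 2, 3, 4]
grid, totals = cross_section(results, questions)
print(grid)
print(hardest_questions(questions, totals))  # [3]
```

Reading across a row diagnoses an individual pupil; reading down a column points to material, such as time sequence, that the whole class needs to have retaught.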

To summarize their value, tests in the field of history will
improve classroom instruction because they can be given and
scored easily; they are objective, being based upon the
performances of enough pupils to enable one to foretell the
percentage of correct answers to each question; and they reveal
class and individual differences, not only aiding in shaping
the review work to some definite end but also being of service
in showing pupils how to study. Moreover, when designed
on the principle of social utility, i.e., to include only content
which is of proved social worth to the child, they will tend
to organize the course of study around the essential experiences.

SUMMARY 

This article has attempted to present considerations
concerning (1) the existing tests in the field of history; (2) general
criticisms of them; and (3) their value to the lay teacher of
history. Such exercises are but a start toward the solution of
the problem, for they are not yet well standardized, nor are
they organized around the vital content of the course.
However, they are of value in that they point the way, by showing
us the method of attacking the problem and by indicating that
such devices are useful to progressive teachers of history:
they enable one to test the aims and outcomes of this subject,
and they will aid in improving classroom instruction in the
manner outlined above.



